-
Notifications
You must be signed in to change notification settings - Fork 99
add synchronization log event #1896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
This looks like it is only useful for Additionally, it seems like this would add some additional noise to the profiler output for executors where |
|
I am not big fan to provide this when we can use the vendor profiler directly. |
|
Let me be a bit more precise: The noise is the biggest issue to me. If the executor doesn't do anything when synchronizing, it also shouldn't show up in the output. Looking at where we actually call |
|
Asynchronous is different topic. |
This PR adds the event to synchronize.
I put event in executor_event collection and operations tag in profile_hook.
To distinguish
profiler_hookadditional sync and function sync, I allow profiler hook to callsynchronize_implsuch that they do not have usual syncrhonize event.BTW, this PR is not enough such that we can distinguish the real copy time from waiting the previous gpu activity because the result in residual_norm copy is pageable memory unless we are on grace hopper or use
cudaMallocHostfor the resulting memory, which will handled in another PR. According to https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior__memcpy-async, it might sync with respect to host.In practice, the current behavior synchronizes during the
cudaMemcpyAsync, so we have two sync - copy and explicit sync actually. Unfortunately, we can not eliminate the sync because cuda only mentions it "might" sync.